1 Data sources

#load case data from the csv
covid_case_data <- read_csv(here::here("data/NCOV_cases_by_postcode_LGA.csv"))
#filter the data for the investigation period
data <- covid_case_data %>% filter(diagnosis_date >= "2020-06-01",
                             diagnosis_date <= "2020-08-13") 

1. Where the Data was found

The data was sourced from the Victoria Health Coronavirus website, which stores the data of all the coronavirus cases throughout Victoria since 2020. They have data sets available for outsiders to download and analyze, as well as post updated information frequently.

Of the data sets available for download the data chosen for this analysis was Victorian case data that contains “All data of PCR and RAT cases by Local Government Area and postcode”. This is primarily because this data set contains more variables thereby giving more flexibility and control for the analysis and results. It will help bolster the objective of this analysis and help improve understanding of the results provided.

This data is protected by copyright under a Creative Commons Attribution Version 4.0, which is an international license. Therefore, when using this data we must be mindful of the restrictions laid out in the aforementioned license.

2. Observational or Experimental Data

This is a observational data as the Victoria Health department specified that they gathered their data through contact tracing and management of the COVID-19 outbreak. This is indicative that researcher or health professionals do not control the factors affecting the variables, meaning that the variables themselves are independent, but rather they attempt to determine correlation of factors. In this scenario the independent variables are the coronavirus cases or people affected by the coronavirus.

3. Codebook - This can be found in the data folder.

4. Unit and potential Unique Identifier

The unit of this data is considered to be the number of covid cases. In this data set the “postcode” can be considered a unique identifier as it can easily identify an individual especially when combined with the diagnosis date and case count.

5. Storing and Analysing the Data

The data set in its current form can be stored and used for analysis of the research question, however the only variables essential to this analysis is the date of diagnosis and the number of cases on those days. Therefore, to keep the report concise and accurate we will remove all columns except “diagnosis_date” as well as create a new column “total_cases” to give the total number of cases on a given day within the period of interest.

cleaned_data <- data %>% 
  group_by(diagnosis_date)%>%
  summarise(total_cases=n())

2 🔍 Analysis

6. Daily Cases over Time of Interest

intervention_dates<- c("2020-06-30", "2020-07-23") #store intervention dates in a variable

#do the following to highlight the intervention dates in plot
cleaned_data$type_of_date<- 
  ifelse(cleaned_data$diagnosis_date == intervention_dates,
         "date_of_intervention","other") 

mycolour<- c("date_of_intervention" = "red", "other" = "grey")

#plot on a graph  
cleaned_data %>%
  ggplot(aes(x= diagnosis_date, y= total_cases))+
  geom_line()+
  
  #add code to highlight specific dates
  geom_point(aes(colour = type_of_date))+
  
  labs(x= "Diagnosis date", y = "Number of cases")
Time series of Covid cases from 30th June to the 13th of September 2020

Figure 2.1: Time series of Covid cases from 30th June to the 13th of September 2020

From this figure we see that the overall trend for the number of cases diagnosed with covid-19 is increasing from the 1st of June to the 13th of September. The 2 point highlighted in red represent the intervention dates in question, the “30th of June 2020” and the “23rd of July” respectively.

7. The 7-Day Growth Rate

lockdown_period <- interval(ymd("20200624"), ymd("20200729"))

growth_rate <- data %>%
 filter(diagnosis_date %within% lockdown_period) %>% #limit it to the lock down period
  group_by(diagnosis_date)%>%
  summarise(total_cases=n())%>%
  mutate(
 growth_rate=slider::slide_dbl(
 .x = total_cases,
 .f = function(v) {
 log(v[7]/v[1]) / 7 #Seven-day growth rate
 },
 .before=8, #consider only the previous values
 .after = 0
 )
 )
#do the same as in previous figure to highlight the intervention dates

growth_rate$type_of_date <- 
  ifelse(growth_rate$diagnosis_date == intervention_dates,
         "date_of_intervention","other") 

mycolour<- c("date_of_intervention" = "red", "other" = "grey")

ggplot(growth_rate,aes(diagnosis_date, growth_rate))+
  geom_line()+
  geom_point(aes(colour = type_of_date))
Growth Rate of Covid cases within the lockdown period of 30th June to 29th July 2020

Figure 2.2: Growth Rate of Covid cases within the lockdown period of 30th June to 29th July 2020

Since we are concerned with effects of local lockdowns, the above growth rate considers the lockdown duration within the investigation period. According to Davey (2020), the lockdown at that point in time was imposed from the 30th of June to the 29th of July 2020. The growth rate is calculated on a weekly basis, meaning that it is calculated using the number of cases 7 days prior to each given date. The two dates of intervention are highlighted in red.

The first thing to note from 2.2 is that the growth rate on the 30th of June displays the growth rate of covid cases for the week prior to the lockdown being imposed. This point can considered a reference point to understand the effectiveness of the lockdown.

We can understand that despite there being an initial spike in the growth rate of cases, it subsided towards the end of the lockdown. It can be seen that the growth rate on the second date of intervention, “23rd July”, was lower compared to the previously calculated growth rate. Although not the lowest, the growth rate on the 23rd of July was one of the lowest calculated upto that point and is considerably lower than other dates throughout the lockdown period. This means the lockdown was relatively effective in this period.

8. Relationship between Growth Rate and Effective Transmission Rate

First let us identify what these two terms mean:

The growth rate, in the context of the corona virus in this case, refers to rate at which the number of covid positive cases increase or decrease. While the effective transmission number refers to the average number of people that can be infected by a single person or the number of new infections an individual can cause.

Growth Rate = g

Effective Transmission Number = R(0)

Average duration of infectiousness = T

It follows this equation:

g = R(0) - 1/ T

The reason the equation above subtracts one is to take out the individual or person infecting others out of the equation. The average duration of infectiousness(T), refers to amount of time a person can transmit or pass covid upon contracting it.

Now lets understand the relationship between growth rate and R(0):

R(0)>1 refers to an individual that is passing on covid to another, meaning that the growth rate will be positive as “anything greater than 1 - 1 = positive value” and time can never be negative. As such, growth rate will increase no matter the magnitude.

On the other hand, the converse is also true. When R(0)<1, it means that an individual is less likely to pass on covid to another meaning that growth rate will be negative. This is because again “anything between 0 and 1 - 1 = negative value”, and time(T) can never be negative. Therefore, growth rate will decrease regardless of magnitude and will eventually end at 0.

In class the formula presented was:

Effective transmission number = e^rt, where r= growth rate and t=interval of time

This is a more realistic model for a number of reasons,

9. Estimating R(0) and Change in R(0) due to intervention

Let us consider the SEIR model for coronavirus specifically. The abbreviation SEIR stands for:

S - People that are Susceptible

E - People that are Exposed but not infectious

I - People that are Infected and infectious

R - People that have Recovered

This model is used to calculate the number of people affected by covid in a given region or area. It helps determine and understand how contagious the virus is or can be overtime as well as the recovery rate and infectious periods, which then assist officials in making informed decisions to combat the virus, in this context covid-19.

It is understood that S+E+I+R=1. This means that the summation of the number of people in these 4 categories make up the entire population of the specific region the SEIR model is being carried out in. It is make clear that all members of the population are considered or accounted for in the computation of this model. Additionally, it ensures that no individual is duplicated. For example, once a person recovers from the coronovirus, they are no longer accounted for in the I-Infected and infectious category or any of the other categories.

Since we are interested in the R(0) of the intervention dates, the SEIR model will be calculated using the growth rate on those 2 particular dates. That is the growth rate on the 30th of June and on the 23rd of July 2020. From then on we can compute the effective transimission number and other required values.

times <- seq(int_start(lockdown_period), 
                   int_end(lockdown_period), by = "1 day")
parameters <- c(
 "beta" = 2 / 3,
 "sigma" = 1 / 5,
 "gamma" = 1 / 6 # Gives an R0 (beta/gamma) of 4
) 
population_vic <- 6.681e6 #population in Victoria as at September 2020

initial_infected <- 10
initial_state <- c("S" = population_vic - initial_infected, 
                   "E" = 0, 
                   "I" = initial_infected, 
                   "R" = 0, 
                   "incidence" =0)

According to an article in Health Direct (2023), a person from about 48 hours before symptoms or around 72 hours before symptoms in high risk situations. Additionally, people can be infectious up to 10 days while already having covid. Therefore, finding the average between the two extremes which is 3 days prior and 10 days, I assumed gamma to be 6 as a rounded figure.

covid_cases_intervention_dates <- function(t, state, parameters) {
 with(as.list(c(state, parameters)), { 
   
 population_vic <- S + E + I + R 
 dSdt <- -beta * S * I / population_vic
 dEdt <- beta * S * I / population_vic - sigma * E
 dIdt <- sigma * E - gamma * I
 dRdt <- gamma * I
 dIncidencedt <- sigma * E
 return(list(
 c(dSdt, dEdt, dIdt, dRdt, dIncidencedt) 
 ))
 })
}
initial_assumption <- ode(
 y = initial_state,
 times = as.numeric(difftime(times, times[1], units = "days")),
 func = covid_cases_intervention_dates,
 parms = parameters
)

view(initial_assumption)

3 📉 Data curation

The criteria I am following in this section is the one laid out by Broman and Woo (2018).

  1. The first thing is to be consistent with the organization of your data. In this Covid-19 data set the variables consistently used were “diagnosed_date” and “total_cases”.

  2. It is important to choose good and meaningful names for variable and file names. In this data set “diagnosed_data” represents the date a patient was diagnosed to be positive with Covid-19, while “total_cases” represents the total number of cases diagnosed on each date.

  3. When writing dates it is best to do so in the universal format YYYY-MM-DD, which has been followed in this analysis as seen in the “diagnosis_date” variable.

  4. In terms of putting a single item in a cell, it has been followed in this analysis where each variable has its own respective cell containing information only pertaining to that specific variable. Putting a single item in a cell is generally meant to avoid errors when having to separate items in a cell, which is not seen in this codebook.

  5. None of the cells have been left empty. If an something did not relate to a certain variable, the cell was filled with ‘NA’. For example, “diagnosis_date” does not in itself have a unit and therefore this cell was filled with NA in the codebook.

  6. Calculations have not been included in the data file, rather the data has been kept in its raw “.csv” format. Another way to identify that this concept was followed is from the object “covid_case_data”. This object has been transformed and stored in other objects but the original “covid_case_data” object has not been altered but left in its raw state.

  7. A data dictionary has been created. It contains the variable names of the filtered data that was used for the analysis. It can be found as a separate file “Codebook”.

  8. This analysis was backed up onto git hub to protect this folder should anything happen. Attached is the link to the git hub repository backup.

  9. The data has been saved in a plain “.csv” format so as to be easily readable by any machine or program. This makes easier to read the data into R for this analysis.

  10. The data was highlighted nor was the font color changed. It has been kept in its raw format for this analysis.

  11. One validation method was to ensure that there are no negative values for number of cases. This is due to the fact that number of cases considers people and therefore cannot be negative.

Resources

Broman, Karl W., and Kara H. Woo. 2018. “Data Organization in Spreadsheets.” The American Statistician 72 (1): 2–10. https://doi.org/10.1080/00031305.2017.1375989.
Davey, Melissa. 2020. “Melbourne Suburbs Lockdown Announced as Victoria Battles Coronavirus Outbreaks.” The Guardian. https://www.theguardian.com/australia-news/2020/jun/30/melbourne-hotspot-lockdowns-announced-as-victoria-battles-coronavirus-outbreaks.
Health Direct. 2023. “Melbourne Suburbs Lockdown Announced as Victoria Battles Coronavirus Outbreaks.” https://www.healthdirect.gov.au/covid-19/recovery-and-returning-to-normal-activities.
Karline Soetaert, Thomas Petzoldt, and R. Woodrow Setzer. 2010. “Solving Differential Equations in R: Package deSolve.” Journal of Statistical Software 33 (9): 1–25. https://doi.org/10.18637/jss.v033.i09.
Spinu, Vitalie, Garrett Grolemund, and Hadley Wickham. 2023. Lubridate: Make Dealing with Dates a Little Easier. https://CRAN.R-project.org/package=lubridate.
Victoria Department of Health. 2023. “Victorian COVID-19 Data.” https://www.coronavirus.vic.gov.au/victorian-coronavirus-covid-19-data.
Wickham, Hadley. 2023. Tidyverse: Easily Install and Load the Tidyverse. https://CRAN.R-project.org/package=tidyverse.
Wickham, Hadley, Winston Chang, Lionel Henry, Thomas Lin Pedersen, Kohske Takahashi, Claus Wilke, Kara Woo, Hiroaki Yutani, and Dewey Dunnington. 2023. Ggplot2: Create Elegant Data Visualisations Using the Grammar of Graphics. https://CRAN.R-project.org/package=ggplot2.
Xie, Yihui. 2023. Bookdown: Authoring Books and Technical Documents with r Markdown. https://CRAN.R-project.org/package=bookdown.